A rainbow table is an efficient way to store data that has been computed in advance to facilitate cracking passwords. To protect stored passwords from compromise in case of a data breach, organizations avoid storing them directly, instead transforming them using a scrambling function – typically a cryptographic hash. One line of attack against this protection is to precompute the hashes of likely or possible passwords, and then store them in a dataset. However, such a dataset can become too big as the range of possible passwords grows. Rainbow tables address this problem by storing chains of possible passwords to save space. Undoing the chains takes significant computation time, but overall this tradeoff makes certain classes of attacks practical. Rainbow tables partition a function (the hash), whose

domain Domain may refer to: Mathematics *Domain of a function, the set of input values for which the (total) function is defined ** Domain of definition of a partial function ** Natural domain of a partial function **Domain of holomorphy of a function * ...

is a set of values and whose codomain is a set of keys derived from those values, into chains such that each chain is an alternating sequence of values and keys, followed by a final value. Each item in the chain is derived from the previous item so that the chain may be algorithmically reproduced from the first value in the chain: A

key derivation function In cryptography, a key derivation function (KDF) is a cryptographic algorithm that derives one or more secret keys from a secret value such as a master key, a password, or a passphrase using a pseudorandom function (which typically uses a cry ...

produces a key from a preceding value, and a reduction function produces a value from a preceding key. The first value and last value of each chain are

precomputed In algorithms, precomputation is the act of performing an initial computation before run time to generate a lookup table that can be used by an algorithm to avoid repeated computation each time it is executed. Precomputation is often used in al ...

and stored, making the chain a row in a virtual table where each even-numbered field contains a value and each odd-numbered field contains the corresponding key. Such a table may be used to discover a secret value (password) given its associated key. It is a practical example of a space–time tradeoff, requiring less processing but using more storage than a brute-force attack which calculates a key on each iteration, but requiring more processing and less storage than a simple table. Use of a key derivation function that employs a

salt Salt is a mineral composed primarily of sodium chloride (NaCl), a chemical compound belonging to the larger class of salts; salt in the form of a natural crystalline mineral is known as rock salt or halite. Salt is present in vast quant ...

makes rainbow tables infeasible for recovering a secret value from a key. Rainbow tables were invented by Philippe Oechslin as an application of an earlier, simpler algorithm by Martin Hellman.

Background

For user authentication, a password is stored either as

plaintext In cryptography, plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. This usually refers to data that is transmitted or stored unencrypted. Overview With the advent of comp ...

or as a key produced from an algorithm that usually involves a hash function. Since a password stored as plaintext may be easily stolen by an actor who gains access to storage, a key corresponding to the password is typically stored instead. Thus, no one – including the authentication system – can learn a password merely by looking at stored value. When a user enters a password for authentication, a key is computed for it and then compared to the key stored for that user. Authentication fails if the two keys do not match; moreover, authentication would equally fail if a key were entered directly as a password, since a different key would be computed for it. To learn a password from a key is to find a string which, when input into the key derivation function, creates that same key. This is the same as inverting the key derivation function. Although a brute-force attack (e.g. dictionary attack) might be used to attempt to invert a key derivation function, brute-force becomes infeasible when the set of possible passwords is large enough. An alternative to brute-force is to use precomputed hash chain tables. Rainbow tables are a special kind of such table that overcome certain technical difficulties.

Etymology

The term, ''rainbow tables'', was first used in Oechslin's initial paper. The term refers to the way different reduction functions are used to increase the success rate of the attack. The original method by Hellman uses many small tables with a different reduction function each. Rainbow tables are much bigger and use a different reduction function in each column. When colors are used to represent the reduction functions, a rainbow appears in the rainbow table. Figure 2 of Oechslin's paper contains a black-and-white graphic that illustrates how these sections are related. For his presentation at the Crypto 2003 conference, Oechslin added color to the graphic in order to make the rainbow association more clear. The enhanced graphic that was presented at the conference is shown to the right.

Precomputed key chains

Given a key derivation function K and a finite set of secret values V, the goal is to precompute a map that, given any output ''k'' of the key derivation function, may be used to locate an element ''v'' in V such that K(''v'') = ''k'', or determine that there is no such ''v'' in V. The simplest way to do this is to compute K(''v'') for all ''v'' in V, but then storing the table requires Θ(, V, ''n'') bits of space, where , V, is the size of the set V and ''n'' is the size of an output of K, which is prohibitive for large , V, . Chains are a technique for decreasing this space requirement. To create a chain, a function, called a ''reduction function'' is used. Given a key, it produces a value in V. It is not necessary that the reduction function produce exactly the inverse map of the key derivation function. It is an independent function with the

and codomain of the key derivation function swapped. A ''chain'' of alternating values and keys is formed. For example, where V is the set of lowercase alphabetic 6-character values, and keys were 32 bits long, a chain might look like this: :

\,\xrightarrow;K\;,\mathtt\,\xrightarrow;R\;,\mathtt\,\xrightarrow;K\;,\mathtt\,\xrightarrow;R\;,

To generate the table, choose a random set of ''initial values'' from V, compute chains having some fixed length ''t'', storing the first and last value of each chain as the ''starting point'' and ''endpoint'', respectively. In the example chain above, "aaaaaa" is the starting point and "kiebgt" is the endpoint. Now, given a key ''k'' to invert (find the corresponding secret value for), compute its own chain starting with ''k'' by applying R, then K, then R, and so on, searching at each step for a match between the current value and an endpoint in the table. When a match is found, ''k'' is almost surely in that chain, which may be reproduced from its starting point, and value immediately ''k'' target secret value ''v''. This works because assuming that all possible values are represented in the virtual table whose rows are the chains, then the chain formed from ''k'' corresponds to one of those chains, and iterating through it eventually produces the value that is the endpoint for that chain in the table. For example, given the pair , its chain can be computed by first applying R: :

\mathtt\,\xrightarrow;R\;,

Since "" is one of the endpoints in our table, the corresponding starting pair "" in the chain may be followed until is reached: :

,\mathtt

Thus, the password is "" (or a different password that has the same key). Note however that this chain does not ''always'' contain the key ''k''; it may so happen that the chain starting at ''k'' merges with a chain having a different starting point. For example, the chain of key FB107E70, also leads to kiebgt: :

\mathtt\,\xrightarrow;R\;,\mathtt\,\xrightarrow;K\;,\mathtt\,\xrightarrow;R\;,

But is not in the chain starting at "aaaaaa". This is called a ''false alarm''. In this case, the match is ignored and the chain of ''k'' is extended looking for another match. If the chain of ''k'' gets extended to length ''t'' with no good matches, then the password was never produced in any of the chains. It is not necessary to invert keys to create the table, which is created once and then repeatedly used for the lookups. Increasing the length of the chain decreases the size of the table. However, it also increases the time required to perform lookups, and this is the time-memory trade-off of the rainbow table. In a simple case of one-item chains, the lookup is very fast, but the table is very big. Once chains get longer, the lookup slows, but the table size goes down. Simple hash chains have several flaws, the most serious of which is that if two chains ''collide'' (produce the same value), they will merge and consequently the table will not cover as many passwords despite having the same computational cost to generate. Because chains are not stored in their entirety, this is impossible to detect efficiently. For example, if the third value in chain 3 matches the second value in chain 7, the two chains will cover almost the same sequence of values, but their final values will not be the same. The key derivation function K is unlikely to produce collisions as it is usually considered an important security feature not to do so, but the reduction function R, because of its need to correctly cover the likely plaintexts, cannot be collision resistant. Other difficulties result from the importance of choosing the correct function for R. Picking R to be the identity is little better than a brute force approach. Only when the attacker has a good idea of what the likely plaintexts will be they can choose a function R that makes sure time and space are only used for likely plaintexts, not the entire space of possible passwords. In effect R shepherds the results of prior key calculations back to likely plaintexts but this benefit comes with the drawback that R likely won't produce every possible plaintext in the class the attacker wishes to check denying certainty to the attacker that no passwords came from their chosen class. Also it can be difficult to design the function R to match the expected distribution of plaintexts.

Rainbow tables

Rainbow tables effectively solve the problem of collisions with ordinary hash chains by replacing the single reduction function R with a sequence of related reduction functions R₁ through R_''k''. In this way, for two chains to collide and merge they must hit the same value ''on the same iteration'': consequently, the final values in these chain will be identical. A final postprocessing pass can sort the chains in the table and remove any "duplicate" chains that have the same final values as other chains. New chains are then generated to fill out the table. These chains are not ''collision-free'' (they may overlap briefly) but they will not merge, drastically reducing the overall number of collisions. Using sequences of reduction functions changes how lookup is done: because the key of interest may be found at any location in the chain, it's necessary to generate ''k'' different chains. The first chain assumes the key is in the last key position and just applies R_''k''; the next chain assumes the hash value is in the second-to-last hash position and applies R_''k''−1, then K, then R_''k''; and so on until the last chain, which applies all the reduction functions, alternating with K. This creates a new way of producing a false alarm: an incorrect "guess" of the position of the hash value may needlessly evaluate a chain. Although rainbow tables have to follow more chains, they make up for this by having fewer tables: simple hash chain tables cannot grow beyond a certain size without rapidly becoming inefficient due to merging chains; to deal with this, they maintain multiple tables, and each lookup must search through each table. Rainbow tables can achieve similar performance with tables that are ''k'' times larger, allowing them to perform a factor of ''k'' fewer lookups.

Example

#Starting from the key ("re3xes") in the image below, one computes the last reduction used in the table and checks whether the password appears in the last column of the table (step 1). #If the test fails (''rambo'' doesn't appear in the table), one computes a chain with the two last reductions (these two reductions are represented at step 2) #:Note: If this new test fails again, one continues with 3 reductions, 4 reductions, etc. until the password is found. If no chain contains the password, then the attack has failed. # If this test is positive (step 3, ''linux23'' appears at the end of the chain and in the table), the password is retrieved at the beginning of the chain that produces ''linux23''. Here we find ''passwd'' at the beginning of the corresponding chain stored in the table. # At this point (step 4), one generates a chain and compares at each iteration the key with the target key. The test is valid and we find the key ''re3xes'' in the chain. The current value (''culture'') is the one that produced the target key (''re3xes''): The attack is successful. Rainbow tables use a refined algorithm with a different reduction function for each "link" in a chain, so that when there is a key collision in two or more chains, the chains will not merge as long as the collision doesn't occur at the same position in each chain. This increases the probability of a correct crack for a given table size, at the cost of squaring the number of steps required per lookup, as the lookup routine now also needs to iterate through the index of the first reduction function used in the chain. Rainbow tables are specific to the key function they were created for e.g., MD5 tables can crack only MD5 hashes. The theory of this technique was invented by Philippe Oechslin as a fast form of time/memory tradeoff, which he implemented in the

Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for se ...

password cracker In cryptanalysis and computer security, password cracking is the process of recovering passwords from data that has been stored in or transmitted by a computer system in scrambled form. A common approach (brute-force attack) is to repeatedly tr ...

Ophcrack. The more powerful RainbowCrack program was later developed that can generate and use rainbow tables for a variety of character sets and hashing algorithms, including LM hash, MD5, and

SHA-1 In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160- bit (20- byte) hash value known as a message digest – typically rendered as 40 hexa ...

. In the simple case where the reduction function and the key have no collision, given a complete rainbow table (one that makes sure to find the corresponding password given any hash) the size of the password set , ''P'', , the time ''T'' that had been needed to compute the table, the length of the table ''L'' and the average time ''t'' needed to find a value matching a given key are directly related: :

T = , P,

t = \frac

Thus the 8-character lowercase alphanumeric passwords case (, ''P'', ≃ 3×10¹²) would be easily tractable with a personal computer while the 16-character lowercase alphanumeric passwords case (, ''P'', ≃ 10²⁵) would be completely intractable.

Defense against rainbow tables

A rainbow table is ineffective against one-way hashes that include large salts. For example, consider a password hash that is generated using the following function (where "" is the

concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalisations of concatenat ...

operator): saltedhash(password) = hash(password + salt) Or saltedhash(password) = hash(hash(password) + salt) The salt value is not secret and may be generated at random and stored with the password hash. A large salt value prevents precomputation attacks, including rainbow tables, by ensuring that each user's password is hashed uniquely. This means that two users with the same password will have different password hashes (assuming different salts are used). In order to succeed, an attacker needs to precompute tables for each possible salt value. The salt must be large enough, otherwise an attacker can make a table for each salt value. For older Unix passwords which used a 12-bit salt this would require 4096 tables, a significant increase in cost for the attacker, but not impractical with terabyte hard drives. The SHA2-crypt and

bcrypt bcrypt is a password-hashing function designed by Niels Provos and David Mazières, based on the Blowfish cipher and presented at USENIX in 1999. Besides incorporating a salt to protect against rainbow table attacks, bcrypt is an adaptive func ...

methods—used in

Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, whi ...

, BSD Unixes, and Solaris—have salts of 128 bits. These larger salt values make precomputation attacks against these systems infeasible for almost any length of a password. Even if the attacker could generate a million tables per second, they would still need billions of years to generate tables for all possible salts. Another technique that helps prevent precomputation attacks is key stretching. When stretching is used, the salt, password, and some intermediate hash values are run through the underlying hash function multiple times to increase the computation time required to hash each password. For instance, MD5-Crypt uses a 1000 iteration loop that repeatedly feeds the salt, password, and current intermediate hash value back into the underlying MD5 hash function. The user's password hash is the concatenation of the salt value (which is not secret) and the final hash. The extra time is not noticeable to users because they have to wait only a fraction of a second each time they log in. On the other hand, stretching reduces the effectiveness of brute-force attacks in proportion to the number of iterations because it reduces the number of attempts an attacker can perform in a given time frame. This principle is applied in MD5-Crypt and in bcrypt. It also greatly increases the time needed to build a precomputed table, but in the absence of salt, this needs only be done once. An alternative approach, called key strengthening, extends the key with a random salt, but then (unlike in key stretching) securely deletes the salt. This forces both the attacker and legitimate users to perform a brute-force search for the salt value. Although the paper that introduced key stretching referred to this earlier technique and intentionally chose a different name, the term "key strengthening" is now often (arguably incorrectly) used to refer to key stretching. Rainbow tables and other precomputation attacks do not work against passwords that contain symbols outside the range presupposed, or that are longer than those precomputed by the attacker. However, tables can be generated that take into account common ways in which users attempt to choose more secure passwords, such as adding a number or special character. Because of the sizable investment in computing processing, rainbow tables beyond fourteen places in length are not yet common. So, choosing a password that is longer than fourteen characters may force an attacker to resort to brute-force methods. Specific intensive efforts focused on LM hash, an older hash algorithm used by Microsoft, are publicly available. LM hash is particularly vulnerable because passwords longer than 7 characters are broken into two sections, each of which is hashed separately. Choosing a password that is fifteen characters or longer guarantees that an LM hash will not be generated.

Common uses

Nearly all distributions and variations of

Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, ...

, and BSD use hashes with salts, though many applications use just a hash (typically MD5) with no salt. The Microsoft Windows NT/2000 family uses the LAN Manager and

NT LAN Manager In a Windows network, NT (New Technology) LAN Manager (NTLM) is a suite of Microsoft security protocols intended to provide authentication, integrity, and confidentiality to users. NTLM is the successor to the authentication protocol in Microsoft L ...

hashing method (based on

MD4 The MD4 Message-Digest Algorithm is a cryptographic hash function developed by Ronald Rivest in 1990. The digest length is 128 bits. The algorithm has influenced later designs, such as the MD5, SHA-1 and RIPEMD algorithms. The initialism "MD" st ...

) and is also unsalted, which makes it one of the most popularly generated tables. Rainbow tables have seen reduced usage as of 2020 as salting is more common and GPU-based brute force attacks have become more practical. However, rainbow tables are available for eight and nine character

NTLM In a Windows network, NT (New Technology) LAN Manager (NTLM) is a suite of Microsoft security protocols intended to provide authentication, integrity, and confidentiality to users. NTLM is the successor to the authentication protocol in Microsoft L ...

passwords.

Notes

References

External links

Ophcrack page by Philippe Oechslin
The original rainbow table research * {{cryptography navbox , hash Cryptographic attacks Search algorithms Cryptographic hash functions Hash based data structures